
Prompt caching with application inference profiles #281

Merged

rstrahan merged 2 commits into develop from fix/enable-cachepoints-for-inference-arns on Apr 14, 2026
Conversation

rstrahan (Contributor) commented Apr 14, 2026

Issue #, if available: #272

  • Prompt caching with application inference profiles — Fixed <<CACHEPOINT>> tags being stripped when using Bedrock application inference profile ARNs as model IDs. The cachepoint check now resolves inference profile ARNs to their underlying foundation model via the GetInferenceProfile API, enabling prompt caching for profiles that wrap supported models (Claude, Nova). Results are cached to avoid repeated API calls, with graceful fallback if the API call fails. A sketch of the resolution logic follows below. (#272)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
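
A minimal sketch of that resolution path, assuming Python with boto3. The function name, cache shape, and fallback behavior here are illustrative, not the PR's actual code:

```python
import boto3
from botocore.exceptions import ClientError

_bedrock = boto3.client("bedrock")

# Cache of profile ARN -> underlying foundation model ARN (None = unresolved),
# so repeated invocations don't re-call GetInferenceProfile.
_profile_cache = {}

def resolve_underlying_model(model_id):
    """Resolve an application inference profile ARN to the foundation model
    it wraps; pass any other model ID through unchanged."""
    if ":application-inference-profile/" not in model_id:
        return model_id
    if model_id in _profile_cache:
        return _profile_cache[model_id]
    try:
        profile = _bedrock.get_inference_profile(
            inferenceProfileIdentifier=model_id
        )
        models = profile.get("models", [])
        resolved = models[0]["modelArn"] if models else None
    except ClientError:
        # Graceful fallback (assumption): treat an API failure as "unresolved"
        # so the caller can fall back to stripping <<CACHEPOINT>> tags.
        resolved = None
    _profile_cache[model_id] = resolved
    return resolved
```

The cachepoint check can then match the resolved ARN against the supported model families (Claude, Nova) exactly as it would a plain foundation model ID.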

Also added the application-inference-profile/* ARN pattern to the bedrock:InvokeModel IAM policies across all templates (root, appsync, multi-doc-discovery, and sample templates). PR #236 previously fixed only patterns/unified/template.yaml; this completes the fix for all Lambda execution roles. The bedrock:GetInferenceProfile read permission is also added to support the prompt caching resolution above. (#272)
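
A hedged sketch of the resulting policy shape in CloudFormation, assuming the pre-existing foundation-model pattern sits alongside the new one; the actual statements live in the repo's templates:

```yaml
# Illustrative only: resource patterns follow the documented Bedrock ARN
# formats; scoping and statement layout in the real templates may differ.
- Effect: Allow
  Action:
    - bedrock:InvokeModel
  Resource:
    - !Sub "arn:${AWS::Partition}:bedrock:*::foundation-model/*"
    - !Sub "arn:${AWS::Partition}:bedrock:${AWS::Region}:${AWS::AccountId}:application-inference-profile/*"
- Effect: Allow
  Action:
    - bedrock:GetInferenceProfile
  Resource:
    - !Sub "arn:${AWS::Partition}:bedrock:${AWS::Region}:${AWS::AccountId}:application-inference-profile/*"
```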

@rstrahan rstrahan changed the base branch from main to develop April 14, 2026 19:11
@rstrahan rstrahan merged commit a5f0224 into develop Apr 14, 2026
5 checks passed
